LtU Forum, Site Discussion

Types and reflection

In my day job as a Java programmer I use a lot of tools that rely heavily on reflection, and I've come up with quite a few uses of reflection that can simplify my work. Having also become quite fond of OCaml and its powerful type system, I've started wondering if combining reflection with a powerful type system is possible. The two features seem quite at odds with each other: reflection completely undermines the type system, something I also see every day at work.

Has anyone looked at ways of combining the power of these two language features? In OCaml I'll have to resort to something like camlp4 if I want to do the stuff I use reflection for in Java. But it seems to me that there might be a middle ground between syntactic extension and the metaprogramming allowed by reflection. Or is there some fundamental reason why this is impossible? As you can probably tell, I really don't have any clue what this is called, or if it exists, or if it's useful, so I'm curious about anything that might shed some light on it.

Continuations from Generalized Stack Inspection

... we show how to use our new technique to copy and reconstitute the stack on MSIL.Net using exception handlers. This establishes that Scheme’s first-class continuations can exist on non-cooperative virtual machines.

Continuations from Generalized Stack Inspection (pdf)

Practical: Designing a graph matching language.

I've written a graph pattern-matcher (Common Lisp), and I was wondering if anyone could offer comments regarding primitives or operators to modify or add. Mostly, I tried to extend regexes to graphs (or any data structure that can contain references), and make it easier to extend the language. Maybe I should have posted this on my own webspace and pasted a link. If that is the case, simply tell me and I'll fix that.

Ob-Grocery list of features: circularity-aware (for the graph that is to be matched, not the pattern against which to match - circular code is wrong ;), not biased towards any data structure, easily extensible and, of course, a dreaded sexp-based syntax.

The core of the language is:

  • alt: like the | operator in regexes, it provides ways to define failover patterns. (alt f1 f2 f3 ...) tries to match the corresponding subgraph against f1, and, if it fails, tries f2, f3, and so on. (alt) fails automatically.
  • and: Succeeds if all of its subclauses match against the subgraph (the subclauses are matched in left to right order). Fails as soon as any of the subclauses fail.
  • access: (access accessor pattern) Matches the pattern against the result of applying accessor (any Common Lisp function) to the corresponding graph node. Backtracks gracefully when the accessor signals an error.
  • pred: (pred predicate) Succeeds if the predicate (a full-blown Common Lisp function) returns true on the graph node, fails otherwise.
  • Digression: I also allow a limited form of state during matching. Basically (so far), I can bind matches to a name, and each binding only succeeds if all the matches bound to a given name are deeply equal. For that reason, pred [when its optional second argument is true] can also fail or succeed based on the result of predicates on both the graph node and the names bound so far (with the subgraphs to which they are bound). I think this could be comparable to one-way unification.

    bindpred: (bindpred predicate). Succeeds or fails based on the result of a predicate on the current graph node and the names bound so far. If the predicate succeeds, its second return value is the updated alist of bindings.

And that's it for the primitives. Lazy, one-level deep, macroexpansion (to allow recursive macros) does the rest.
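To make the primitives above concrete, here is a minimal sketch in Python rather than the Common Lisp of the actual matcher. It is not the author's implementation: patterns are modelled as tuples, the bindings alist as a dict, and backtracking as a generator that yields every set of bindings under which a pattern matches. The names match, first_match, bind_as and same_ends are made up for illustration. The optional second argument of pred and the circularity handling are left out.

```python
def match(pattern, node, bindings):
    """Yield every bindings dict under which `pattern` matches `node`."""
    op, *args = pattern
    if op == "alt":                      # try each alternative in order
        for sub in args:
            yield from match(sub, node, bindings)
    elif op == "and":                    # thread bindings left to right
        if not args:
            yield bindings
        else:
            for b in match(args[0], node, bindings):
                yield from match(("and", *args[1:]), node, b)
    elif op == "access":                 # match against accessor(node)
        accessor, sub = args
        try:
            child = accessor(node)
        except Exception:                # an accessor error is a plain failure
            return
        yield from match(sub, child, bindings)
    elif op == "pred":                   # predicate on the node only
        if args[0](node):
            yield bindings
    elif op == "bindpred":               # predicate sees and may extend bindings
        ok, new_bindings = args[0](node, bindings)
        if ok:
            yield new_bindings
    else:
        raise ValueError(f"unknown operator: {op!r}")


def first_match(pattern, node):
    """Return the first successful bindings dict, or None on failure."""
    return next(match(pattern, node, {}), None)


# Hypothetical helpers: 2-tuples stand in for conses.
def car(n): return n[0]
def cdr(n): return n[1]
def is_pair(n): return isinstance(n, tuple) and len(n) == 2

def bind_as(name):
    """Build a bindpred predicate that binds `name`, insisting on deep equality."""
    def predicate(node, bindings):
        if name in bindings and bindings[name] != node:
            return False, bindings
        return True, {**bindings, name: node}
    return predicate

# Matches a pair whose car and cdr are deeply equal.
same_ends = ("and", ("pred", is_pair),
                    ("access", car, ("bindpred", bind_as("x"))),
                    ("access", cdr, ("bindpred", bind_as("x"))))

print(first_match(same_ends, (3, 3)))   # {'x': 3}
print(first_match(same_ends, (3, 4)))   # None
```

Using a generator means an alt nested inside an and can be retried when a later clause fails, which seems to be the behaviour the description of alt implies; a committed-choice version would be shorter but weaker.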

Other operators are defined in terms of the previously described primitives and macroexpansion.

  • atom: (atom) matches any atom. (atom foo) matches any atom that is equal to foo.
  • any: (any) matches anything. (any type) matches anything that is of that type.
  • bind: (bind name pattern) tries to bind "name" to the graph node if "pattern" matches against the node. It expands to:
    (and pattern
         (%bind-now name))
    where %bind-now expands into a bindpred clause. %bind-now does the "tries to bind 'name' to the graph node" part of bind's job description. If there is no previous binding for "name", or the previous binding is deeply equal to the current graph node, it succeeds (and saves a binding from name to the graph node if needed); otherwise, it fails.
  • rep: The star of the show, and the only recursive construct so far. rep is analogous to the (Kleene?) star in regexes.

    (rep pattern pattern-final &rest accessors) [everything after pattern-final is kept in a list of accessors] matches a graph node against "pattern," and if that is successful, all of that node's children (those that are accessed through the accessors) against "pattern," and so on. When "pattern" doesn't match, it still succeeds if "pattern-final" matches against the same graph node and fails otherwise. rep is implemented purely in terms of macroexpansion: (rep pattern pattern-final accessor1 accessor2 ...) expands into:

    (alt (and pattern
             (access accessor1 (rep pattern pattern-final accessor1 accessor2 ...))
             (access accessor2 (rep pattern pattern-final accessor1 accessor2 ...))
             ...)
         pattern-final)
    
    Note that the expansion is recursive, which is why macroexpansion is lazy.

  • cons: The only datastructure-specific operator. (cons &key (car-clause '(any)) (cdr-clause '(any)))

    Matches any cons whose car matches car-clause (which defaults to (any), which matches anything) and whose cdr matches cdr-clause (same default). It expands into:

    (and (pred consp)            ;;consp is a predefined predicate
         (access car car-clause)
         (access cdr cdr-clause))
    (with the appropriate defaults)
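The lazy, one-level-deep expansion of bind and rep described above can be sketched as follows. This is an illustration in Python under assumptions, not the actual Common Lisp code: forms are tuples, accessors are plain strings, and expand_once is a made-up name. The point is that the nested rep forms come back unexpanded, so the recursive expansion only proceeds as far as matching actually demands.

```python
def expand_once(form):
    """Expand only the outermost macro of `form`; nested forms are left alone."""
    op, *args = form
    if op == "bind":
        name, pattern = args
        return ("and", pattern, ("%bind-now", name))
    if op == "rep":
        pattern, final, *accessors = args
        # Each child is matched against the *same, still unexpanded* rep form.
        steps = tuple(("access", acc, form) for acc in accessors)
        return ("alt", ("and", pattern, *steps), final)
    return form  # primitives and unknown forms are returned untouched


rep = ("rep", ("cons",), ("atom",), "car", "cdr")
expanded = expand_once(rep)
# expanded is:
#   ("alt", ("and", ("cons",),
#                   ("access", "car", rep),
#                   ("access", "cdr", rep)),
#           ("atom",))
```

An eager expander would recurse into the nested rep forms and never terminate, which is why a matcher built this way would call something like expand_once only when it reaches a non-primitive form.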

Example usage:

To match any combination of cons with only 2 atoms in it:

(rep (cons)
     (alt (bind a (atom))
          (bind b (atom)))
     car cdr)
It traverses conses' car and cdr, matching any cons. When the current node isn't a cons, it saves the corresponding atom under the name a or b. If "a" is already bound to something that isn't equal to the current node, it tries to bind it to "b". If, again, "b" is already bound to something that isn't equal to the current node, the matching fails. Thus, it can only match up to two different non-cons atoms (I think there is some redundancy here ;). Note that when it does match successfully, the bindings alist is returned as a second return value. No point in wasting all that work!

Now, it obviously wouldn't be very hard to build a recursive function to do the same job -- in fact, this is exactly what the matcher will do. However, making the same function work in the presence of circularity would make this already repetitive job a tad more tedious.
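For comparison, a hand-written version of the two-atom example might look like the Python sketch below. It is only an illustration: the Cons class and the name at_most_two_atoms are made up, and the rule that a cons already visited counts as a match is an assumption, since the post does not say how the matcher treats revisited nodes. The seen set is the extra bookkeeping that circularity forces on every such function.

```python
class Cons:
    """A mutable pair, so that circular structures can be built."""
    def __init__(self, car, cdr):
        self.car = car
        self.cdr = cdr


def at_most_two_atoms(node, bindings=None, seen=None):
    """Return the bindings for 'a' and 'b', or None if a third distinct atom appears."""
    bindings = {} if bindings is None else bindings
    seen = set() if seen is None else seen
    if isinstance(node, Cons):
        if id(node) in seen:          # assumption: a revisited cons just succeeds
            return bindings
        seen.add(id(node))
        bindings = at_most_two_atoms(node.car, bindings, seen)
        if bindings is None:
            return None
        return at_most_two_atoms(node.cdr, bindings, seen)
    # Atom: try "a" first, then "b", as the alt clause in the pattern does.
    for name in ("a", "b"):
        if name not in bindings:
            return {**bindings, name: node}
        if bindings[name] == node:
            return bindings
    return None


loop = Cons(1, Cons(2, None))
loop.cdr.cdr = loop                   # the list now points back at itself
print(at_most_two_atoms(loop))        # {'a': 1, 'b': 2}
```

Without the seen set the call on loop would recurse until the stack overflows; with it, the traversal visits each cons once.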

Comments, questions or suggestions (additional operators or another syntax, for example) would be appreciated.

Paul Khuong

EDIT: How _I_ expect to use this: I want to write a compiler that works, as much as possible, through graph rewriting. I hope the matcher can help a lot there.

EDIT2: Fixed the expansion for rep

Actual programs written in FP or FL?

I've become enamored with Backus's FP language (and the lesser known follow-up FL) as of late. Does anyone know of *any* programs written in either of these languages that are outside the 1-3 line toy variety? I've found nothing. Even 10+ line toy programs would be fine, but a larger program would be even better.

(For programs in a related style there's Iverson's J, but I'd still like to see some real FP programs.)

Workshop on Synchronization and Concurrency in OO languages

The workshop on Synchronization and Concurrency in Object-Oriented Languages has a nice, accessible collection of papers on software transactional memory and other language-based approaches to building concurrent systems.

Formal Frustration...

First off, I should qualify that I'm self-taught, and second that, budget permitting, LtU has been a major source of book purchases over the past few years (TaPL, CTM, EOPL, ...). So, I've been working through these books in a bit of a cyclical fashion (i.e. coming back to the parts I have no clue about and hoping they make sense this time...), and I avidly download and read papers that people link to from here (understanding them, of course, is an entirely different matter).

Well, lately, I've developed a DSL for some of my work, and I think it's all very innovative and unique, and I thought, "I should try and formalize the type system and semantics for this little language," and then I realized it: I have no clue.

I really just don't know what to do to formalize my language. Oh, I think I know the relevant concepts, I just don't know where to start.

Is there some book out there that would help with this? (Okay, I realize I may have already bought/read/skimmed said book, and densely missed the chapter, "How to Formalize Your DSL in 3 Easy Steps".)

Forgive me if I just showed my ignorance or laziness, but this is what got me interested in programming languages to start with (wanting to write down the formal semantics much the way I can write down the context-free grammar), and this skill continues to elude me.

Perhaps my attempt to learn PLT by osmosis and waterfall method is not working out so great...

The breaking point of language usability?

The indefatigable Bruce Eckel is learning all about Java generics so he can write about them in a way that explains things to mere mortals. It is clear to me that the Average Joes who have been using Java are going to have their minds blown by such things, and I wonder if Java has taken a large step along the pirate ship plank off to C++-like complexity and confoundedness? Along this vein of thought, advanced functional languages don't get much use in industry and I think people attribute it partly to their tougher learning curve.

So my question is, at what point have you made a language that regular folks simply won't be willing to learn? And the challenge is, how can languages be designed to give advanced benefits yet hide the complexities? Why can't machines better hide issues like co- vs. contravariant type usage (or whatever)?

Are we missing out on fancy types?

The main Haskell and ML compilers integrate many extensions to Hindley-Milner, but many other extensions are left out: intersection types, basic (DML or ATS) dependent types, constraint-handling systems like HM(X), generic types, subtyping … Some of these are new, but some of them are older than other extensions that are getting integrated into mainstream compilers. What happens to these ideas after their toy compilers leave town?

Also, why do the main compiler projects not integrate work done in each other? Why doesn't SML have type classes? Why doesn't GHC have polymorphic variants?

My guess for the first question is that the work necessary to code a new type inferencer is less than what it takes to integrate a new type inferencer into a pre-existing compiler. Pushed by grant or graduation deadlines, the simpler work is done. In the meantime, people like Simon Peyton-Jones beg for GHC hackers.

As for the second question, I suspect there's a little bit of NIH syndrome acting together with grant pressure: row types in HM have already been done in O'Caml, so why would ICFP publish a paper about doing the same thing in Haskell?

Of course, perhaps the most valuable extensions are being integrated, or perhaps these things are a matter of chance, as interests and Ph.D. students come and go.

internship advice

I was hoping to get some advice about a dilemma I'm in. I'll be entering a Ph.D. program next fall (hopefully), and I'm already having trouble finding summer internships because few companies want to take on interns that 1. are just starting their research careers and 2. won't be available for full-time work for many years. Have any of you been in a similar situation or have any thoughts about it?
